
    SHIRAZ: an automated histology image annotation system for zebrafish phenomics

    Histological characterization is used in clinical and research contexts as a highly sensitive method for detecting the morphological features of disease and abnormal gene function. Histology has recently been accepted as a phenotyping method for the forthcoming Zebrafish Phenome Project, a large-scale community effort to characterize the morphological, physiological, and behavioral phenotypes resulting from mutations in all known genes in the zebrafish genome. In support of this project, we present a novel content-based image retrieval system for the automated annotation of images containing histological abnormalities in the developing eye of the larval zebrafish.

    Phenotype Recognition with Combined Features and Random Subspace Classifier Ensemble

    Background: Automated, image-based high-content screening is a fundamental tool for discovery in biological science. Modern robotic fluorescence microscopes can capture thousands of images from massively parallel experiments such as RNA interference (RNAi) or small-molecule screens. Efficient computational methods are therefore required for automatic cellular phenotype identification that can handle large image data sets. In this paper we investigated an efficient method for extracting quantitative features from images by combining second-order statistics, or Haralick features, with the curvelet transform. A random subspace-based classifier ensemble with a multilayer perceptron (MLP) as the base classifier was then used for classification. Haralick features estimate image properties related to second-order statistics based on the grey-level co-occurrence matrix (GLCM), which has been used extensively in image processing applications. The curvelet transform gives a sparser representation of the image than the wavelet transform, offering a description with higher time-frequency resolution and a high degree of directionality and anisotropy, which is particularly appropriate for images rich in edges and curves. Combining Haralick and curvelet features can further increase classification accuracy by exploiting their complementary information. We then investigate the applicability of the random subspace (RS) ensemble method to phenotype classification based on microscopy images. Each base classifier is trained on an RS-sampled subset of the original feature set, and the ensemble assigns a class label by majority voting.
    Results: Experimental results on phenotype recognition from three benchmark image sets (HeLa, CHO and RNAi) show the effectiveness of the proposed approach. The combined feature outperforms either individual feature in classification accuracy, and the ensemble model classifies better than its component neural networks. For the HeLa, CHO and RNAi image sets, the random subspace ensembles achieve classification rates of 91.20%, 98.86% and 91.03% respectively, which compares favorably with the published results of 84%, 93% and 82% from WND-CHARM, a multi-purpose image classifier that applies wavelet transforms and other feature extraction methods. We also investigated the estimation of ensemble parameters and found that a satisfactory performance improvement could be obtained with feature subsets of relatively medium dimensionality and a small ensemble size.
    Conclusions: The multiscale and multidirectional characteristics of the curvelet transform suit the description of microscopy images very well. It is empirically demonstrated that curvelet-based features are clearly preferable to wavelet-based features for bioimage description. The random subspace ensemble of MLPs performs much better than a number of commonly applied multi-class classifiers in the investigated application of phenotype recognition.
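    To illustrate the GLCM-based Haralick features mentioned above, here is a minimal pure-Python sketch; it is not the authors' implementation. It assumes an image already quantized to a small number of grey levels (the hypothetical `levels` parameter), uses a single non-symmetric pixel offset, and computes only two Haralick statistics (contrast and energy). The curvelet features and the full feature set used in the paper are not reproduced.

```python
from collections import Counter


def glcm(image, dx=1, dy=0, levels=4):
    """Normalized grey-level co-occurrence matrix for one pixel offset (dx, dy).

    Entry [i][j] is the fraction of pixel pairs in which grey level i at (r, c)
    is paired with grey level j at (r + dy, c + dx). Non-symmetric by design.
    """
    rows, cols = len(image), len(image[0])
    counts = Counter()
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:  # skip pairs that fall off the image
                counts[(image[r][c], image[r2][c2])] += 1
    total = sum(counts.values())
    return [[counts[(i, j)] / total for j in range(levels)] for i in range(levels)]


def haralick_contrast(p):
    """Contrast: sum over i, j of (i - j)^2 * p[i][j]; large when neighbours differ."""
    n = len(p)
    return sum((i - j) ** 2 * p[i][j] for i in range(n) for j in range(n))


def haralick_energy(p):
    """Energy (angular second moment): sum of p[i][j]^2; large for uniform texture."""
    return sum(v * v for row in p for v in row)


# Toy 4x4 image already quantized to grey levels 0..3.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(image, dx=1, dy=0, levels=4)
print(haralick_contrast(p), haralick_energy(p))
```

    For this toy image the horizontal-offset GLCM is built from 12 pixel pairs, giving a contrast of 7/12 and an energy of 1/6. Libraries usually also offer symmetric matrices and several offsets averaged together; this sketch keeps a single offset for clarity.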
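    The random subspace scheme described above (each base classifier is trained on a random subset of the features, and the ensemble takes a majority vote) can be sketched as follows. This is a hypothetical illustration, not the paper's code: a nearest-centroid classifier stands in for the MLP base learners, and the class names and the parameters `n_members` and `subspace_size` are assumptions chosen for the example.

```python
import random
from collections import Counter


class NearestCentroid:
    """Toy base learner standing in for the paper's MLPs (assumption)."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for k, v in enumerate(x):
                acc[k] += v
        self.centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}
        return self

    def predict_one(self, x):
        # Pick the class whose centroid is closest in squared Euclidean distance.
        return min(self.centroids,
                   key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, self.centroids[lab])))


class RandomSubspaceEnsemble:
    """Each member is trained on a random subset of feature indices; members vote."""

    def __init__(self, n_members=11, subspace_size=2, seed=0):
        self.n_members = n_members
        self.subspace_size = subspace_size
        self.rng = random.Random(seed)

    def fit(self, X, y):
        n_features = len(X[0])
        self.members = []
        for _ in range(self.n_members):
            # Sample feature indices without replacement for this member.
            idx = sorted(self.rng.sample(range(n_features), self.subspace_size))
            clf = NearestCentroid().fit([[x[i] for i in idx] for x in X], y)
            self.members.append((idx, clf))
        return self

    def predict_one(self, x):
        # Majority vote over members, each seeing only its own feature subset.
        votes = Counter(clf.predict_one([x[i] for i in idx]) for idx, clf in self.members)
        return votes.most_common(1)[0][0]


# Toy data: two well-separated classes described by four features.
X = [[0.0, 0.0, 0.0, 0.0], [0.1, 0.2, 0.0, 0.1],
     [5.0, 5.0, 5.0, 5.0], [5.2, 4.9, 5.1, 5.0]]
y = ["a", "a", "b", "b"]
ens = RandomSubspaceEnsemble(n_members=7, subspace_size=2, seed=1).fit(X, y)
print(ens.predict_one([0.05, 0.1, 0.0, 0.0]), ens.predict_one([5.0, 5.0, 5.0, 5.0]))  # prints: a b
```

    The ensemble size and subspace dimensionality here are only for the toy data; the abstract reports that a relatively medium subspace dimensionality and a small ensemble size were sufficient in its experiments.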

    A systematic, large-scale comparison of transcription factor binding site models

    Background: The modelling of gene regulation is a major challenge in biomedical research. This process is dominated by transcription factors (TFs), and mutations in their binding sites (TFBSs) may cause the misregulation of genes, eventually leading to disease. The consequences of DNA variants on TF binding are modelled in silico using binding matrices, but it remains unclear whether these are capable of accurately representing in vivo binding. In this study, we present a systematic comparison of binding models for 82 human TFs from three freely available sources: JASPAR matrices, HT-SELEX-generated models and matrices derived from protein binding microarrays (PBMs). We determined their ability to detect experimentally verified “real” in vivo TFBSs derived from ENCODE ChIP-seq data. As negative controls we chose random downstream exonic sequences, which are unlikely to harbour TFBSs. All models were assessed by receiver operating characteristic (ROC) analysis. Results: While the area under the curve (AUC) was low for most of the tested models, with only 47% reaching a score of 0.7 or higher, we noticed strong differences between the various position-specific scoring matrices, with JASPAR and HT-SELEX models showing higher success rates than PBM-derived models. In addition, we found that while TFBS sequences showed a higher degree of conservation than randomly chosen sequences, there was high variability between individual TFBSs. Conclusions: Our results show that only a few of the matrix-based models used to predict potential TFBSs are able to reliably detect experimentally confirmed TFBSs. We compiled our findings in a freely accessible web application called ePOSSUM (http://mutationtaster.charite.de/ePOSSUM/), which uses a Bayes classifier to assess the impact of genetic alterations on TF binding in user-defined sequences. Additionally, ePOSSUM provides information on the reliability of the prediction using our test set of experimentally confirmed binding sites.
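    The evaluation described above, scoring sequences with a position-specific scoring matrix and summarizing how well true sites separate from controls by the area under the ROC curve, can be sketched in Python. This is a simplified illustration under stated assumptions, not the study's pipeline or ePOSSUM's Bayes classifier: the matrix is a log2-odds matrix built from a few hypothetical aligned sites with a uniform background and a pseudocount, each sequence is scored by its best window on one strand only, and the AUC is computed as the probability that a positive outscores a negative (ties count one half).

```python
import math

BASES = "ACGT"


def build_pssm(sites, pseudocount=0.5, background=0.25):
    """Log2-odds position-specific scoring matrix from equal-length aligned sites."""
    pssm = []
    for pos in range(len(sites[0])):
        column = [s[pos] for s in sites]
        total = len(column) + 4 * pseudocount
        pssm.append({b: math.log2((column.count(b) + pseudocount) / total / background)
                     for b in BASES})
    return pssm


def best_window_score(seq, pssm):
    """Highest summed score over all windows of motif length (forward strand only)."""
    w = len(pssm)
    return max(sum(pssm[k][seq[i + k]] for k in range(w))
               for i in range(len(seq) - w + 1))


def roc_auc(pos_scores, neg_scores):
    """AUC as P(random positive scores above random negative); ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


sites = ["TGACTCA", "TGACTCA", "TGAGTCA", "TGACTAA"]  # hypothetical aligned sites
pssm = build_pssm(sites)
positives = ["AATGACTCAGG", "CCTGAGTCATT"]  # toy "bound" sequences
negatives = ["AAAAAAAAAAA", "GGGCCCGGGCC"]  # toy control sequences
pos_scores = [best_window_score(s, pssm) for s in positives]
neg_scores = [best_window_score(s, pssm) for s in negatives]
print(roc_auc(pos_scores, neg_scores))  # prints 1.0
```

    On real data, positives would come from ChIP-seq peaks and negatives from control regions, and an AUC near 0.5 would indicate a matrix that barely separates them; the 0.7 threshold in the abstract is one such cut-off.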

    Distinct Functional Constraints Partition Sequence Conservation in a cis-Regulatory Element

    Different functional constraints contribute to different evolutionary rates across genomes. To understand why some sequences evolve faster than others in a single cis-regulatory locus, we investigated the function and evolutionary dynamics of the promoter of the Caenorhabditis elegans unc-47 gene. We found that this promoter consists of two distinct domains. The proximal promoter is conserved and is largely sufficient to direct appropriate spatial expression. The distal promoter displays little if any conservation between several closely related nematodes. Despite this divergence, sequences from all species confer robustness of expression, arguing that this function does not require substantial sequence conservation. We showed that even unrelated sequences have the ability to promote robust expression. A prominent feature shared by all of these robustness-promoting sequences is an AT-enriched nucleotide composition consistent with nucleosome depletion. Because general sequence composition can be maintained despite sequence turnover, our results explain how different functional constraints can lead to vastly disparate rates of sequence divergence within a promoter.

    Widespread Site-Dependent Buffering of Human Regulatory Polymorphism

    The average individual is expected to harbor thousands of variants within non-coding genomic regions involved in gene regulation. However, it is currently not possible to reliably interpret the functional consequences of genetic variation within any given transcription factor recognition sequence. To address this, we comprehensively analyzed heritable genome-wide binding patterns of a major sequence-specific regulator (CTCF) in relation to genetic variability in binding site sequences across a multi-generational pedigree. We localized and quantified CTCF occupancy by ChIP-seq in 12 related and unrelated individuals spanning three generations, followed by comprehensive targeted resequencing of the entire CTCF-binding landscape across all individuals. We identified hundreds of variants with reproducible quantitative effects on CTCF occupancy (both positive and negative). While these effects paralleled protein–DNA recognition energetics when averaged, they were extensively buffered by striking local context dependencies. In the large majority of cases buffering was complete, resulting in silent variants spanning every position within the DNA recognition interface, irrespective of binding energy or evolutionary constraint. The prevalence of complex partial or complete buffering effects severely constrained the ability to reliably predict the impact of variation within any given binding site instance. Surprisingly, 40% of variants that increased CTCF occupancy occurred at positions of human–chimp divergence, challenging the expectation that the vast majority of functional regulatory variants should be deleterious. Our results suggest that, even in the presence of “perfect” genetic information afforded by resequencing and parallel studies in multiple related individuals, genomic site-specific prediction of the consequences of individual variation in regulatory DNA will require systematic coupling with empirical functional genomic measurements.

    Tempo and Mode in Evolution of Transcriptional Regulation

    Perennial questions of evolutionary biology can be applied to gene regulatory systems using the abundance of experimental data addressing gene regulation in a comparative context. What is the tempo (frequency, rate) and mode (way, mechanism) of transcriptional regulatory evolution? Here we synthesize the results of 230 experiments performed on insects and nematodes in which regulatory DNA from one species was used to drive gene expression in another species. General principles of regulatory evolution emerge. Gene regulatory evolution is widespread and accumulates with genetic divergence in both insects and nematodes. Divergence in cis is more common than divergence in trans. Coevolution between cis and trans shows a particular increase over greater evolutionary timespans, especially in sex-specific gene regulation. Despite these generalities, the evolution of gene regulation is gene- and taxon-specific. The congruence of these conclusions with evidence from other types of experiments suggests that general principles are discoverable, and a unified view of the tempo and mode of regulatory evolution may be achievable.

    Selective deployment of transcription factor paralogs with submaximal strength facilitates gene regulation in the immune system

    In multicellular organisms, duplicated genes can diverge through tissue-specific gene expression patterns, as exemplified by the highly regulated expression of Runx transcription factor paralogs with apparent functional redundancy. Here we asked which cell type-specific functions might be supported by the selective expression of Runx paralogs during Langerhans cell and inducible regulatory T cell differentiation. We uncovered functional non-equivalence between Runx paralogs. Selective expression of native paralogs allowed integration of transcription factor activity with extrinsic signals, while non-native paralogs enforced differentiation even in the absence of exogenous inducers. DNA-binding affinity was controlled by divergent amino acids within the otherwise highly conserved RUNT domain, and evolutionary reconstruction suggested convergence of RUNT domain residues towards submaximal strength. Hence, the selective expression of gene duplicates in specialized cell types can synergize with the acquisition of functional differences to enable appropriate gene expression, lineage choice and differentiation in the mammalian immune system.

    Occupancy maps of 208 chromatin-associated proteins in one human cell type

    Transcription factors are DNA-binding proteins that have key roles in gene regulation. Genome-wide occupancy maps of transcriptional regulators are important for understanding gene regulation and its effects on diverse biological processes. However, only a minority of the more than 1,600 transcription factors encoded in the human genome have been assayed. Here we present, as part of the ENCODE (Encyclopedia of DNA Elements) project, data and analyses from chromatin immunoprecipitation followed by high-throughput sequencing (ChIP–seq) experiments using the human HepG2 cell line for 208 chromatin-associated proteins (CAPs). These comprise 171 transcription factors and 37 transcriptional cofactors and chromatin regulator proteins, and represent nearly one-quarter of CAPs expressed in HepG2 cells. The binding profiles of these CAPs form major groups associated predominantly with promoters or enhancers, or with both. We confirm and expand the current catalogue of DNA sequence motifs for transcription factors, and describe motifs that correspond to other transcription factors that are co-enriched with the primary ChIP target. For example, FOX family motifs are enriched in ChIP–seq peaks of 37 other CAPs. We show that motif content and occupancy patterns can distinguish between promoters and enhancers. This catalogue reveals high-occupancy target regions at which many CAPs associate, although each contains motifs for only a minority of the numerous associated transcription factors. These analyses provide a more complete overview of the gene regulatory networks that define this cell type, and demonstrate the usefulness of the large-scale production efforts of the ENCODE Consortium.

    Transcriptomic alterations in the heart of non-obese type 2 diabetic Goto-Kakizaki rats

    BACKGROUND: There is a spectacular rise in the global prevalence of type 2 diabetes mellitus (T2DM) due to the worldwide obesity epidemic. However, a significant proportion of T2DM patients are non-obese, and they also have an increased risk of cardiovascular diseases. As the Goto-Kakizaki (GK) rat is a well-known model of non-obese T2DM, the goal of this study was to investigate the effect of non-obese T2DM on cardiac alterations of the transcriptome in GK rats. METHODS: Fasting blood glucose, serum insulin and cholesterol levels were measured at 7, 11, and 15 weeks of age in male GK and control rats. Oral glucose tolerance tests and pancreatic insulin level measurements were performed at 11 weeks of age. At week 15, total RNA was isolated from the myocardium and assayed by rat oligonucleotide microarray for 41,012 genes, and expression of selected genes was then confirmed by qRT-PCR. Gene ontology and protein-protein network analyses were performed to identify potentially characteristic gene alterations and key genes in non-obese T2DM. RESULTS: Fasting blood glucose, serum insulin and cholesterol levels were significantly increased, and glucose tolerance and insulin sensitivity were significantly impaired, in GK rats as compared to controls. In hearts of GK rats, 204 genes showed significant up-regulation and 303 genes showed down-regulation as compared to controls according to microarray analysis. Genes with significantly altered expression in the heart due to non-obese T2DM include functional clusters of metabolism (e.g. Cyp2e1, Akr1b10), signal transduction (e.g. Dpp4, Stat3), receptors and ion channels (e.g. Sln, Chrng), membrane and structural proteins (e.g. Tnni1, Mylk2, Col8a1, Adam33), cell growth and differentiation (e.g. Gpc3, Jund), immune response (e.g. C3, C4a), and others (e.g. Lrp8, Msln, Klkc1, Epn3). Gene ontology analysis revealed several significantly enriched functional inter-relationships between genes influenced by non-obese T2DM. Protein-protein interaction analysis demonstrated that Stat is a potential key gene influenced by non-obese T2DM. CONCLUSIONS: Non-obese T2DM alters the cardiac gene expression profile. The altered genes may be involved in the development of cardiac pathologies and could be potential therapeutic targets in non-obese T2DM.